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Abstract 

Experimental  data  analysis  is  an  key  activity  in  metrology,  the  science  of  measurement. 
It  involves  developing  a mathematical  model  of  the  physical  system  in  terms  of  mathem- 
atical equations  involving  parameters  that  describe  all  the  relevant  aspects  of  the  system. 
The  model  specifies  how  the  system  is  expected  to  respond  to  input  data  and  the  nature 
of  the  uncertainties  in  the  inputs.  Given  measurement  data,  estimates  of  the  model  para- 
meters are  determined  by  solving  the  mathematical  equations  constructed  as  part  of 
the  model,  and  this  requires  developing  an  algorithm  (or  estimator)  to  determine  values 
for  the  parameters  that  best  explain  the  data.  In  many  cases,  the  parameter  estimates 
are  given  by  the  solution  of  a least-squares  problem.  This  paper  discusses  how  various 
uncertainty  structures  associated  with  the  measurement  data  can  be  taken  into  consider- 
ation and  describes  the  algorithms  used  to  solve  the  resulting  regression  problems.  Two 
applications  from  NPL  are  described  which  require  the  solution  of  generalised  distance 
regression  problems:  the  use  of  measurements  of  primary  standard  natural  gas  mixtures 
to  estimate  the  composition  of  a new  natural  gas  mixture,  and  the  analysis  of  calibration 
data  to  estimate  the  effective  area  of  a pressure  balance. 


1 Introduction 

Many  metrology  experiments  involve  determining  the  behaviour  of  a response  variable  y 
as  a function  of  a set  of  independent  variables  x = {x\,X2, . . . ,xn}.  Model  building  in- 
volves establishing  the  functional  relationship  between  these  quantities,  usually  involving 
a set  of  model  parameters  a,  i.e., 

V*  = 0(x*,a), 

where  y*  and  x*  represent  exact  values  of  the  variables.  The  terms  a parametrize  the 
range  of  possible  response  behaviour  and  the  actual  behaviour  is  specified  by  determ- 
ining values  for  these  parameters  from  measurement  data.  In  practice,  measurements 
are  subject  to  error,  and  the  error  structure  must  be  taken  into  account  firstly  in  order 
to  determine  effective  methods  for  obtaining  parameter  estimates  and  secondly  in  de- 
termining the  uncertainty  in  the  fitted  model  parameters.  For  a set  of  measurement  data 
{Xj,  yi}iLi,  the  data  analysis  problem  involves  the  accurate  estimation  of  the  parameters 
a,  taking  into  account  knowledge  of  the  uncertainties  in  {x,}  and/or  {y,},  and  typically 
leads  to  a least-squares  problem  [4] . 

This  paper  describes  the  various  uncertainty  structures  that  arise  and  corresponding 
regressions  problems  for  determining  estimates  of  the  model  parameters.  If  the  covari- 
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ance  information  associated  with  the  measurements  is  structured  so  that  only  the  ith  set 
of  measurement  errors  are  correlated  with  each  other,  a generalised  distance  regression 
approach  is  appropriate.  However,  some  applications  have  quite  general  correlation  struc- 
ture and  a full  Gauss-Markov  estimation  approach  is  required  to  make  efficient  use  of 
the  statistical  model  [7].  This  leads  to  a generalised  Gauss-Markov  regression  problem  to 
take  into  account  the  errors  in  the  variables  and  the  general  correlation  structure.  While 
the  covariance  structure  may  dictate  which  solution  algorithms  are  to  be  employed,  the 
information  required  of  the  model  function  <j>  is  limited  to  the  evaluation  of  the  function 
and  its  derivatives  with  respect  to  a and  x.  This  means  that  solution  algorithms  can 
be  based  on  a compact  set  of  model-dependent  modules  and  a generic  set  of  harnessing 
routines  that  link  the  models  to  general  purpose  least-squares  optimisation  software. 

The  layout  of  the  paper  is  as  follows.  In  Section  2 we  consider  the  various  error  struc- 
tures and  corresponding  regression  problems.  Section  3 introduces  two  measurement 
problems  encountered  at  NPL:  the  use  of  measurements  of  primary  standard  natural 
gas  mixtures  to  estimate  the  composition  of  a new  natural  gas  mixture;  and  the  ana- 
lysis of  calibration  data  to  estimate  the  effective  area  of  a pressure  balance.  Although 
the  functional  models  for  these  measurement  systems  are  simple,  taking  the  form  of 
low-order  polynomials,  the  statistical  models  need  to  account  for  (a)  uncertainties  in 
both  the  dependent  and  independent  variables,  and  (b)  possible  correlations  between 
measurements.  These  requirements  lead  us  to  solve  generalised  regression  problems.  An 
overview  of  solution  algorithms  for  the  various  problems  is  given  in  Section  4.  Concluding 
remarks  are  made  in  Section  5. 

2 Error  structures  and  regression  problems 

Within  metrology,  various  error  structures  arise  all  of  which  can  be  taken  into  account. 
We  now  consider  the  main  types. 

2.1  Error  in  one  variable  only 

2.1.1  Ordinary  (weighted)  least  squares 

The  simplest  type  of  error  structure  occurs  when  only  one  of  the  system  variables  is 
subject  to  error  and  there  is  no  correlation  between  errors.  The  model  is  summarised  by 

y*  = <KX*,  a),  Vi  = y*i  + eu  Xj  = X* , 
where  it  is  assumed  that 

E(ei)  = 0,  var(ej)  = erf,  cov(ei,e;?)  = 0,  % ^ j.  (2.1) 

Good  estimates  of  a can  be  found  by  solving 

m 

mm  Y^wibJi  ~ a)]2, 

i= 1 


where  = 1 / <7* , i = 1, . . . , m. 
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2.1.2  Gauss-Markov  regression 

If  instead  of  (2.1),  the  measurement  errors  are  correlated  so  that 

E(e)  = 0,  var(e)  = V, 

with  V full  rank,  then  an  estimate  of  a can  be  found  by  solving 

min[y-0(a)]Ty_1[y-0(a)],  (2.2) 

a 

where  the  ith  element  of  0(a)  is  </>(xn  a). 

2.2  Errors  in  more  than  one  variable 

In  many  metrological  applications  more  than  one  of  the  measured  variables  is  subject  to 
error,  and  this  must  be  taken  into  account  in  order  to  determine  estimates  of  the  model 
parameters  which  are  statistically  efficient  and  free  from  major  bias. 

2.2.1  Orthogonal  distance  regression 

The  simplest  case  arises  when  the  covariance  matrix  associated  with  the  ith  set  of  meas- 
urements is  a multiple  of  the  identity  matrix  and  there  is  no  correlation  between  any  of 
the  errors,  summarised  by  the  model 

y*  = 0(x*,a),  yi  = y*  + e,;,  x*  = x*  + 8U 

with 

£(*?<)  = 0,  varfo)  = pfl,  (2.3) 

where  rji  = (ei,Sj)T.  In  this  case,  appropriate  estimates  of  the  parameters  are  determ- 
ined by  the  solution  of 

m 

min  X>?{(xi  - x*)T(Xj  - x*)  + (y4  - </>(x*,  a))2}, 

{x'ha  7TT 


where  Vi  = 1/pi,  i = 1, . . . ,m. 

Note  that  this  optimisation  problem  involves  m sets  of  parameters  x*  as  well  as  the 
parameters  a specifying  the  model  y — 0(x,  a) . 

2.2.2  Generalised  distance  regression 


If  we  assume  that  the  errors  r)L  are  correlated  with  var(?7i)  = Vi  with  V,  full  rank,  but 
that  cov(r)i,T)j)  = 0,  i ^ j,  then  the  appropriate  regression  problem  is 


min 

(x*}.a 


E 


Vi  - 0(x*,a) 

x,:  - x* 


Vi  — 0(x* , a) 

Xi  - X* 


(2.4) 


2.2.3  Generalised  Gauss-Markov  regression 

The  most  complicated  error  structure  arises  when  all  variables  are  subject  to  measure- 
ment error  and  there  is  general  correlation  between  the  errors.  If  £ (£*)  is  the  vector  of 
measurements  {x,}  (variables  {x*}),  then  the  corresponding  regression  problem  is 


min 

£>« 


’ y - 0(1,  a)  " 

T 

v~l 

' y-0(C,a)  ’ 

€-€* 

V 

i-v 

(2.5) 
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where  the  ith  element  of  </>(£, a)  is  </>(x*,  a). 

3 Examples  from  metrology 

3.1  Preparation  of  primary  standard  natural  gas  mixtures 

Within  the  Centre  for  Optical  and  Analytical  Measurement  at  NPL,  one  part  of  the 
work  of  the  Environmental  Standards  Group  is  to  prepare  primary  standard  natural  gas 
mixtures.  These  are  cylinders  containing  natural  gas  prepared  gravimetrically  to  con- 
tain known  compositions  of  each  of  the  11  constituent  components  (methane,  ethane, 
propane,  1-butane,  n-butane,  1-pentane,  n-pentane,  neo-pentane,  hexane,  nitrogen  and 
carbon  dioxide).  Mixtures  are  prepared  to  cover  various  concentration  ranges,  e.g.,  meth- 
ane: 64%  — 98%.  These  primary  standard  mixtures  are  used  as  the  basis  for  determining 
the  composition  of  a new  mixture  and  hence  its  calorific  value. 

Given  a number  of  primary  standard  natural  gas  mixtures  containing  known  con- 
centrations of  one  of  the  constituent  components  (e.g.,  CO2),  the  detector  response  for 
each  mixture  and  the  detector  response  for  the  new  mixture,  we  wish  to  determine  the 
concentration  of  CO2  in  the  new  mixture. 

An  approach  to  solving  this  problem  is  firstly  to  use  the  calibration  data  (relating  to 
the  primary  gas  mixtures)  to  calibrate  the  detector  and,  secondly,  to  use  the  calibration 
curve  so  constructed  with  the  new  measurement  to  predict  the  concentration  in  the  new 
mixture. 

Errors  to  be  accounted  for  are: 

• the  calibration  data  is  known  inexactly.  The  process  of  preparing  the  primary  stand- 
ards means  that  they  are  known  inexactly,  and  indeed  the  errors  in  the  standards 
may  be  correlated  (this  is  a consequence  of  the  gravimetric  process  used  to  prepare 
the  standard  mixtures  which  involves  comparing  on  a balance  each  standard  mix- 
ture at  each  stage  of  preparation  against  calibrated  masses  selected  from  a common 
set  of  masses), 

• the  data  returned  by  the  detector  (which  is  based  on  the  analytical  technique  of 
chromatography)  is  subject  to  measurement  error. 

Consequently,  we  wish  our  data  analysis  to  account  for  the  inexactness  of  the  meas- 
urement data  and  to  quantify  the  resulting  uncertainty  associated  with  the  final  meas- 
urement result. 

Figure  1 shows  a sample  set  of  measurement  data,  with  the  ellipses  around  the  calib- 
ration points  illustrating  the  errors  in  the  concentrations  and  detector  responses.  (The 
error  ellipses  have  been  magnified  greatly  for  illustrative  purposes.)  The  figure  also  shows 
a calibration  curve  which  is  used  to  estimate  the  concentration  of  the  component  for 
which  the  detector  response  (and  its  uncertainty)  is  known. 

3.2  Calibration  of  pressure  balances 

The  principal  role  of  the  Pressure  and  Vacuum  Section  in  the  Centre  for  Mechanical  and 
Acoustical  Metrology  at  NPL  is  the  development  and  maintenance  of  primary  measure- 
ment standards  for  pressure  and  vacuum  and  their  dissemination  to  industry.  Pressure 
balances  are  pressure  generators  and  consist  essentially  of  finely-machined  pistons  moun- 
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Fig.  1.  Sample  data  (+),  fitted  calibration  curve  and  predicted  measurement  (o). 

ted  vertically  in  close-fitting  cylinders.  The  pressure  required  to  support  a piston  and 
associated  ring-weights  depends  on  the  mass  of  the  piston  and  ring-weights  and  the 
cross-sectional  area  of  the  piston  [5].  Due  to  various  fluid  dymamic  effects,  the  effective 
area  A(p,  a)  of  the  piston-cylinder  assembly  is  a function  of  pressure,  usually  taken  to 
be  a linear  function  A(p,  a)  = a\  + a2p.  Many  other  factors  such  as  temperature  and  air 
buoyancy  have  to  be  taken  into  account  but  for  our  purposes  here,  the  pressure  generated 
satisfies 

aip  + a2p 2 = y{m), 

where  a are  the  instrument  parameters  and  y(m ) is  a simple  function  of  the  applied  load 
m.  This  equation  determines  p implicitly  as  a function  of  m and  a.  Suppose  a reference 
pressure  balance  has  been  calibrated  so  that  estimates  of  the  instrument  parameters 
a and  their  uncertainties  are  known.  The  reference  balance  can  be  used  to  calibrate  a 
test  balance  in  a cross-floating  experiment  in  the  following  way.  A load  m*  is  applied 
to  the  reference  balance  to  generate  pressure  p,  — p(mt,  a).  A load  nl  is  applied  to  the 
test  balance  so  that  the  pressures  generated  are  matched.  The  test  calibration  curve  is 
determined  from  a best  fit  to  the  data  (n^p^) 

biP*  + b2(p*)2  = y(n*),  Pi=p*+ei,  m = n*+€i, 

where  <5*  and  e*  represent  measurement  error  associated  with  the  pressures  and  masses, 
respectively.  However,  the  following  must  be  taken  into  account.  Firstly,  the  pressures 
Pi  all  depend  on  the  common  estimates  a of  the  instrument  parameters  of  the  reference 
balance,  leading  to  correlation  of  the  measurement  errors  6,  . Secondly,  the  masses  n; 
and  rrii  are  made  up  from  the  same  ensemble  of  masses  p = (jii , . . . , Pm)t  so  that 

ft;  = njp,  ml  = mjp, 


Generalised  Gauss-Markov  regression 


275 


where  n,  and  mt  are  binary  coefficient  vectors.  This  means  that  measurement  errors 
associated  with  the  masses  pk  give  rise  to  (further)  correlation  between  Si  and  e*.  Taking 
this  general  correlation  into  account,  estimates  of  the  the  instrument  parameters  b,  are 
found  from  solving 

min 

b,p* 


y -<t> 

1—1 

1 

H 

-e-* 

1 

>> 

1 

p-p* 

! 

p-p 

(3.1) 


where  the  ith  elements  of  </>  and  y are  bip*  + 62 (p*) 2 and  y(rii),  respectively,  and  V is 
the  appropriate  covariance  matrix  determined  from  the  dependence  of  y and  0 on  a and 
fi.  This  is  a generalised  Gauss-Markov  regression  problem. 


4 Algorithms  for  generalised  regression 

Algorithms  for  ordinary  least  squares  problems  of  the  form  mina  /2(a)  are  well  known 
and  include  QR  factorisation  methods  for  linear  models  or  the  Gauss-Newton  algorithm 
for  non-linear  models;  see,  e.g.,  [2,  6].  The  latter  algorithm  requires  the  user  to  supply  a 
software  module  to  evaluate  the  vector  of  function  values  f(a)  and  the  Jacobian  matrix 
J of  partial  derivatives 


dfj 

ddj' 


If  /j(a)  = Pi  — r/>(x,,  a)  as  considered  above,  the  user  has  to  supply  a module  to  calculate 
0(x,  a)  and  d<p/daj. 

If  V is  symmetric  and  strictly  positive  definite,  the  Gauss-Markov  regression  problem 
(2.2)  can  be  formulated  as  an  ordinary  least  squares  problem.  If  V — LL  r where  L is 
lower-triangular,  then  the  problem  becomes 

min/?  (a), 
a 


where  f = L-1f.  The  associated  Jacobian  matrix  is  J = L~lJ.  If  the  matrix  V is 
well-conditioned,  matrix  operations  with  V or  L~l  should  not  lead  to  unnecessary  loss 
of  precision.  However,  explicit  calculations  involving  V can  be  avoided  by  using  the 
generalised  QR  factorisation  [2,  8,  9],  leading  to  solution  algorithms  with  good  numerical 
properties. 

The  generalised  distance  regression  problem  (2.4)  can  be  solved  efficiently  by  making 
use  of  the  fact  that  the  parameters  x*  appear  only  in  the  ith  summand.  The  associated 
Jacobian  matrix  has  a block-angular  structure  that  can  be  exploited  effectively  in  the 
QR  factorisation  stage  [2,  3].  Alternatively,  a separation-of- variables  approach  can  be 
adopted  in  which  the  parameters  x*  (a)  are  first  determined  as  functions  of  a specified 
by  the  solution  of  the  corresponding  footpoint  problem 


min 


Vi  - <«x*,a) 
x,-  — x* 


V, 


-1 


Pi-0(x*,a) 

X;  — X? 


and  the  problem  formulated  as  a non-linear  least  squares  problem  in  a [1,  4].  Either  ap- 
proach yields  an  algorithm  requiring  0(mn2)  flops  while  a full  matrix  approach  requires 
0(m3)  flops. 


The  generalised  Gauss  Markov  problem  (2.5)  can  be  solved  as  a Gauss-Markov  prob- 
lem problem  in  the  variables  {x*}  and  a,  but  ideally,  we  would  like  to  develop  algorithms 
that  exploit  problem  structure  as  in  generalised  distance  regression  algorithms.  In  partic- 
ular, while  the  covariance  matrix  V may  well  be  full,  in  many  situations  it  is  constructed 
from  smaller  matrices  and  for  which  more  efficient  algorithms  could  be  developed. 

From  the  user’s  point  of  view,  all  the  regression  algorithms  discussed  here  require  only 
the  calculation  of  the  model  function  <j>  and  its  derivatives  and  Thus,  a wide 
range  of  regression  problems  can  be  solved  using  standard  optimisation  modules  along 
with  generic  harness  modules  that  perform  the  conversion  without  input  from  the  user 
over  and  above  the  calculation  of  <j)  and  its  derivatives.  For  example,  we  have  implemented 
a generalised  Gauss-Markov  solver  to  solve  problems  such  as  (3.1)  for  any  explicit  model 
y = 4>(x,  a).  However,  issues  of  efficiency  and  numerical  stability  need  to  be  taken  into 
account.  As  part  of  the  UK  Department  of  Trade  and  Industry’s  Software  Support  for 
Metrology  programme,  NPL  is  developing  and  making  available  to  metrologists  a suite  of 
routines  for  the  generalised  regression  problems  discussed  above.  By  combining  structure 
exploiting  linear  algebra  and  numerically  stable  components  such  as  the  orthogonal 
factorisation,  it  is  hoped  that  metrologists  will  be  able  to  use  these  routines  with  the 
same  confidence  and  effectiveness  that  they  currently  experience  with  standard,  well- 
engineered  regression  modules  available  in  numerical  libraries. 

5 Concluding  remarks 

In  metrology,  we  are  interested  in  the  determination  of  accurate  estimates  of  the  paramet- 
ers that  describe  a physical  process.  It  is  imperative  that  knowledge  of  the  measurement 
system  should  be  used  to  describe  the  error  structure  as  accurately  as  possible.  We  have 
described  the  five  types  of  regression  problems  that  can  occur  in  metrology  depending 
on  the  error  structures  that  are  assumed.  In  all  cases  it  is  important  that  we  employ 
efficient,  numerically  stable  algorithms  and  exploit  any  structure  in  both  the  Jacobian 
and  covariance  matrices. 
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